Abstract

We examine the classical joint source-channel coding problem from the viewpoint of statistical
physics and demonstrate that in the random coding regime, the posterior probability distribution
of the source given the channel output is dominated by source sequences that exhibit a behavior
highly parallel to that of thermal equilibrium between two systems of particles that exchange
energy, where one system corresponds to the source and the other corresponds to the channel. The
thermodynamical entropies of the dual physical problem are analogous to conditional and
unconditional Shannon entropies of the source, and so their balance in thermal equilibrium yields
a simple formula for the mutual information between the source and the channel output that is
induced by the typical code in an ensemble of joint source-channel codes under certain conditions.
We also demonstrate how our results can be used in applications, like the wiretap channel, and how
they can be extended to multiuser scenarios, like that of the multiple access channel.

Index Terms: joint source-channel coding, statistical physics, thermal equilibrium, mutual 
information, entropy. 



1 Introduction 

Consider the following two seemingly unrelated problems, which serve as simple special cases of a 
more general setting we study later in this paper: 

The first is an elementary problem in statistical physics: We have two subsystems of particles
which are brought into thermal equilibrium with each other as well as with the environment (a
heat bath) at temperature T. The first subsystem consists of N particles having magnetic moments
(spins), $\{s_i\}$, each of which may be oriented either in the direction of an applied external
magnetic field B, in which case $s_i = +1$, or in the opposite direction, in which case $s_i = -1$,
and its energy in both cases is given by $-s_iB$ (up to a certain multiplicative constant, which
carries the appropriate physical units, and which is irrelevant for the purpose of this discussion).
In the second subsystem, there are n non-interacting particles $\{s'_i\}_{i=1}^n$, each one of which
may lie in one of two possible states: the state $s'_i = 0$, in which the particle has zero energy,
and the state $s'_i = 1$, in which it has energy $\epsilon_0$. What is the average energy possessed
by each one of these subsystems in equilibrium, as functions of $\epsilon_0$, T, n, N, and B?

*Part of this work was carried out during a visit in Hewlett-Packard Laboratories, Palo Alto, CA, U.S.A., in the
Summer of 2008.

The second problem is in Information Theory, in particular, it is in joint source-channel coding,
where some of the notation used is deliberately chosen to be the same as in the previous paragraph:
A binary memoryless source generates a vector $\mathbf{s} = (s_1, s_2, \ldots, s_N)$ of symbols,
$s_i \in \{+1,-1\}$, $i = 1,\ldots,N$, with probabilities $q = \Pr\{S_i = +1\}$ and
$1 - q = \Pr\{S_i = -1\}$. This vector is encoded into a binary channel codeword
$\mathbf{x}(\mathbf{s})$ of length n and transmitted over a binary symmetric channel (BSC) with a
crossover probability p < 1/2, and a binary n-vector $\mathbf{y}$ is received at the channel
output. Consider the posterior distribution

$$P(\mathbf{s}|\mathbf{y}) = \frac{P(\mathbf{s})W(\mathbf{y}|\mathbf{x}(\mathbf{s}))}{\sum_{\mathbf{s}'}P(\mathbf{s}')W(\mathbf{y}|\mathbf{x}(\mathbf{s}'))}$$

where $P(\mathbf{s})$ and $W(\mathbf{y}|\mathbf{x})$ are the probability distributions that govern
the source and the channel, respectively, as described above. Thus, clearly,
$P(\mathbf{s}|\mathbf{y})$ is proportional to $P(\mathbf{s})W(\mathbf{y}|\mathbf{x}(\mathbf{s}))$, or
equivalently, $\ln P(\mathbf{s}|\mathbf{y})$ is (within a term that is independent of $\mathbf{s}$)
given by $\ln P(\mathbf{s}) + \ln W(\mathbf{y}|\mathbf{x}(\mathbf{s}))$.
For a typical code drawn uniformly at random from the ensemble of codes, what are the relative
contributions of the source and the channel to this quantity, for those vectors $\mathbf{s}$ that
dominate $P(\mathbf{s}|\mathbf{y})$ (i.e., those that capture the vast majority of the posterior
probability)?

It turns out, as we shall see in Section 3 below, that the two problems have virtually identical
answers (in a sense that will be made clear and precise therein), provided that the parameters T
and B of the first problem are related to the parameters p and q of the second problem by

$$p = \frac{\exp\{-\epsilon_0/kT\}}{1 + \exp\{-\epsilon_0/kT\}} \qquad (1)$$

and

$$q = \frac{\exp\{B/kT\}}{2\cosh(B/kT)} \qquad (2)$$

or equivalently,

$$\epsilon_0 = kT\ln\frac{1-p}{p} \qquad (3)$$

and

$$B = \frac{kT}{2}\ln\frac{q}{1-q} \qquad (4)$$

where k is Boltzmann's constant.
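
As a quick sanity check of relations (1)-(4), the following minimal Python sketch maps a pair (p, q) to the physical parameters and back. The function names and the choice of natural units (k = 1) are assumptions made for illustration only; they do not appear in the paper.

```python
import math

K_B = 1.0  # Boltzmann's constant in natural units (assumption; only ratios to kT matter)

def channel_and_source_from_physics(eps0, B, T, k=K_B):
    """Relations (1) and (2): crossover probability p and source bias q."""
    p = math.exp(-eps0 / (k * T)) / (1.0 + math.exp(-eps0 / (k * T)))
    q = math.exp(B / (k * T)) / (2.0 * math.cosh(B / (k * T)))
    return p, q

def physics_from_channel_and_source(p, q, T, k=K_B):
    """Relations (3) and (4): level spacing eps0 and magnetic field B."""
    eps0 = k * T * math.log((1.0 - p) / p)
    B = 0.5 * k * T * math.log(q / (1.0 - q))
    return eps0, B

# Round trip: (p, q) -> (eps0, B) -> (p, q) at an arbitrary temperature.
T = 2.0
eps0, B = physics_from_channel_and_source(p=0.1, q=0.7, T=T)
p, q = channel_and_source_from_physics(eps0, B, T=T)
print(eps0, B, p, q)
```

The round trip recovers the original (p, q) up to floating-point error, and 2q - 1 coincides with tanh(B/kT), which is the identity used later when comparing typical values.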

Thermal equilibrium between the two subsystems in the above described physical problem dictates
a certain balance between their thermodynamical entropies in order to arrive at the maximum
total entropy (by the second law of thermodynamics) for the total energy possessed by the entire
system at the given temperature T. As the thermodynamical entropy, in its statistical-mechanical
definition, is intimately related to the Shannon entropy, it turns out that this equilibrium relation
between the thermodynamical entropies of the physical problem gives rise to an analogous relation
between Shannon entropies pertaining to the joint source-channel coding problem in the random 
coding regime. In particular, it relates the entropy of the source to its conditional entropy given 
the channel output, whose difference is exactly the mutual information between the source and the 
channel output. The final outcome of this is a simple formula for calculating the mutual infor- 
mation rate between the input and the output of a coded system for the typical code in a given 
ensemble under certain conditions. This calculation builds strongly on the random energy model 
(REM) of spin glasses due to Derrida [3, 4, 5] and its relation to the random code ensemble (RCE) 
as described in [12]. 

Clearly, under the regime of reliable communication, the mutual information rate between the 
source and the channel output coincides with the entropy rate of the source, as the conditional 
entropy rate of the source given the channel output vanishes. Thus, the problem of calculating 
the mutual information under reliable communication conditions is easy and in fact, not quite 
interesting. The same calculation, however, when the conditions of reliable communication are not 
met, appears less trivial. But what would be the motivation for such a calculation? 

Here are just a few examples that motivate this: Consider a user that, in addition to its
desired signal, also receives a relatively strong interfering signal (codeword), which is intended
for other users, and which comes from a codebook whose rate exceeds the capacity of this crosstalk
channel between the interferer and our user, so that the user cannot fully decode this interference.
Nonetheless, our user would like to learn as much as possible about the interfering signal for many
possible reasons: For example, the user would like to learn the interference signal in order to identify
where it originates from, or in order to estimate it and subtract it (interference cancellation). The
mutual information rate, call it I, between the interference signal and the channel output then
gives some assessment concerning the quality of this estimation. For one thing, D(I), where $D(\cdot)$
is the distortion-rate function of the source, is a lower bound to the distortion in estimating this
signal. Moreover, if the channel is Gaussian, one can calculate the exact minimum mean square
error (MMSE) from the mutual information rate I by taking its derivative w.r.t. the signal-to-noise
ratio (SNR) [9]. Another application comes from scenarios where the above described receiver is
a hostile party (an eavesdropper), from which one would like to conceal information as much as
possible. The natural setup, in this context, is that of the wiretap channel (cf. [14] as well as many 
follow-up papers), where excess channel noise beyond capacity is harnessed as an effective key 
that secures data communication. As we show in the sequel, the mutual information rate between 
the transmitted message and the eavesdropper, which suffers from this excess noise, is strongly 
related to the equivocation, which is a customary measure of security in Shannon-theoretic secrecy 
systems. 

The outline of this paper is as follows. In Section 2, we establish notation conventions. In Section 
3, we provide some basic background of elementary statistical physics, which will be needed in the 
sequel. In Section 4, we derive our main result, which is a formula for the mutual information rate. 
In Section 5, we demonstrate how it is applied for the wiretap channel, and finally, in Section 6, 
we demonstrate how our results can be extended to multiuser scenarios, like that of the multiple 
access channel. 

2 Notation Conventions 

Throughout this paper, scalar random variables (RV's) will be denoted by capital letters, like
S, X, and Y, their sample values will be denoted by the respective lower case letters, and their
alphabets will be denoted by the respective calligraphic letters. A similar convention will apply to
random vectors and their sample values, which will be denoted by the same symbols in the bold
face font. Thus, for example, $\mathbf{X}$ will denote a random n-vector $(X_1, \ldots, X_n)$, and
$\mathbf{x} = (x_1, \ldots, x_n)$ is a specific vector value in $\mathcal{X}^n$, the n-th Cartesian
power of $\mathcal{X}$. Sources and channels will be denoted generically by the letters P, Q, M,
and W. Whenever clarity and unambiguity require it, these letters will be subscripted by the names
of the relevant RV's, following the standard notation conventions in the literature. For example,
$P_S$ will denote the probability distribution of a random variable S, $P_{X|Y}$ will denote the
conditional probability distribution of X given Y, and so on. The cardinality of a finite set
$\mathcal{A}$ will be denoted by $|\mathcal{A}|$. Information theoretic quantities like entropies
and mutual informations will be denoted following the usual conventions of the information theory
literature.

3 Background 

In this section, we provide a brief account of the very basic background in statistical physics, which 
is needed for this paper. 

Consider a physical system with N particles, which can be in a variety of microscopic states
('microstates'), defined by combinations of physical quantities associated with these particles,
e.g., positions, momenta, angular momenta, spins, etc., of all N particles. For each such
microstate of the system, which we shall designate by a vector $\mathbf{s} = (s_1, \ldots, s_N)$,
there is an associated energy, given by a Hamiltonian (energy function), $\mathcal{E}(\mathbf{s})$.
For example, if $s_i = (\mathbf{p}_i, \mathbf{r}_i)$, where $\mathbf{p}_i$ is the momentum vector of
particle number i and $\mathbf{r}_i$ is its position vector, then classically,
$\mathcal{E}(\mathbf{s}) = \sum_{i=1}^N\left[\frac{\|\mathbf{p}_i\|^2}{2m} + mgz_i\right]$, where m is
the mass of each particle, $z_i$ is its height (one of the coordinates of $\mathbf{r}_i$), and g is
the gravitational acceleration.

One of the most fundamental results in statistical physics (based on the law of energy conservation
and the basic postulate that all microstates of the same energy level are equiprobable) is that
when the system is in thermal equilibrium with its environment, the probability of a microstate
$\mathbf{s}$ is given by the Boltzmann-Gibbs distribution

$$P(\mathbf{s}) = \frac{e^{-\beta\mathcal{E}(\mathbf{s})}}{Z(\beta)} \qquad (5)$$

where $\beta = 1/(kT)$, k being Boltzmann's constant and T being temperature, and $Z(\beta)$ is the
normalization constant, called the partition function, which is given by

$$Z(\beta) = \sum_{\mathbf{s}} e^{-\beta\mathcal{E}(\mathbf{s})}$$

or

$$Z(\beta) = \int d\mathbf{s}\, e^{-\beta\mathcal{E}(\mathbf{s})},$$

depending on whether $\mathbf{s}$ is discrete or continuous. The role of the partition function goes
far deeper than just being a normalization factor, as it is actually the key quantity from which many
macroscopic physical quantities can be derived. For example, the free energy¹ is
$-\frac{1}{\beta}\ln Z(\beta)$, the average internal energy (i.e., the expectation of
$\mathcal{E}(\mathbf{s})$ where $\mathbf{s}$ is drawn according to (5)) is given by the negative
derivative of $\ln Z(\beta)$ with respect to $\beta$, the heat capacity is obtained from the second
derivative, etc. One of the ways to obtain eq. (5) is as the maximum entropy distribution under an
energy constraint (owing to the second law of thermodynamics), where $\beta$ plays the role of a
Lagrange multiplier that controls this energy level.
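
To make the role of the partition function concrete, here is a minimal Python sketch for the two-level particles of the Introduction. It enumerates all microstates of a small system, forms Z(β) directly, and recovers the average energy as the negative derivative of ln Z(β). The parameter values, the function names, and the use of a finite-difference derivative are illustrative assumptions.

```python
import itertools
import math

def partition_function(beta, eps0, n):
    """Z(beta) by brute-force enumeration of all 2**n microstates.

    Each particle is in state 0 (energy 0) or state 1 (energy eps0).
    """
    return sum(math.exp(-beta * eps0 * sum(state))
               for state in itertools.product((0, 1), repeat=n))

def average_energy(beta, eps0, n, h=1e-6):
    """Average energy as -d ln Z / d beta, using a central difference."""
    def ln_z(b):
        return math.log(partition_function(b, eps0, n))
    return -(ln_z(beta + h) - ln_z(beta - h)) / (2.0 * h)

beta, eps0, n = 0.8, 1.5, 6            # illustrative values, natural units
z = partition_function(beta, eps0, n)
free_energy = -math.log(z) / beta      # -(1/beta) ln Z(beta)
numeric_energy = average_energy(beta, eps0, n)
closed_form_energy = n * eps0 * math.exp(-beta * eps0) / (1.0 + math.exp(-beta * eps0))
print(z, free_energy, numeric_energy, closed_form_energy)
```

For non-interacting particles the enumeration factorizes, so Z(β) equals (1 + exp{-β ε_0})^n, and the numerical derivative agrees with the closed-form average energy n ε_0 exp{-β ε_0}/(1 + exp{-β ε_0}).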

Let us define the quantity:

$$\Omega_{N,\delta}(\epsilon) = \left|\left\{\mathbf{s}:\ (\epsilon - \delta/2)N \le \mathcal{E}(\mathbf{s}) \le (\epsilon + \delta/2)N\right\}\right|, \qquad (6)$$

and let us assume that the limit

$$\Sigma(\epsilon) = \lim_{\delta\to 0}\lim_{N\to\infty}\frac{\ln\Omega_{N,\delta}(\epsilon)}{N}$$

exists and that $\Sigma(\epsilon)$ is a differentiable concave function. $\Sigma(\epsilon)$ is the
entropy of the physical system in its statistical-mechanical definition. We will see shortly that it
is intimately related to the Shannon entropy associated with the Boltzmann-Gibbs probability
distribution $P(\mathbf{s})$ defined above.

To see why the concavity assumption makes sense, note that at least when $P(\mathbf{s})$ is a product
distribution (namely, when $\mathcal{E}(\mathbf{s}) = \sum_i\mathcal{E}(s_i)$),

$$\Omega_{N,\delta}\left(\frac{N_1\epsilon_1 + N_2\epsilon_2}{N}\right) \ge \Omega_{N_1,\delta}(\epsilon_1)\cdot\Omega_{N_2,\delta}(\epsilon_2)$$

since for every configuration $\mathbf{s}$, where $N_1 < N$ particles have total energy
$N_1\epsilon_1$ and $N_2 = N - N_1$ particles have total energy $N_2\epsilon_2$, the total energy of
all $N = N_1 + N_2$ particles is obviously $N_1\epsilon_1 + N_2\epsilon_2$, but the converse is not
true since there are other ways to split the total energy of $N_1\epsilon_1 + N_2\epsilon_2$ between
the two complementary subsets of particles. Thus, taking the logarithm of both sides, dividing by
$(N_1 + N_2)$, then taking the limits of $N_1, N_2 \to \infty$ such that $N_1/N_2$ tends to a given
constant, and finally, taking the limit of $\delta \to 0$, one readily observes that
$\Sigma(\epsilon)$ is concave. An argument of the same spirit can be exercised in somewhat more
general situations, e.g., when $P(\mathbf{s})$ has a Markov structure (namely, the physical system
has some nearest-neighbor interactions), though some more caution is required.

¹The free energy means the maximum work that the system can carry out in any process of fixed temperature.
The maximum is obtained when the process is reversible (slow, quasi-static changes in the system).

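Denoting

$$\psi(\beta) = \lim_{N\to\infty}\frac{1}{N}\ln\sum_{\mathbf{s}}\exp\{-\beta\mathcal{E}(\mathbf{s})\},$$

it is readily seen that

$$\psi(\beta) = \lim_{\delta\to 0}\lim_{N\to\infty}\frac{1}{N}\ln\left[\sum_{j\ge 0}\Omega_{N,\delta}\big((j + 1/2)\delta\big)\cdot\exp\{-N\beta j\delta\}\right] = \sup_{\epsilon\ge 0}[\Sigma(\epsilon) - \beta\epsilon], \qquad (7)$$

i.e., $\psi(\cdot)$ and $\Sigma(\cdot)$ are a Legendre-transform pair. Since $\Sigma(\cdot)$ is assumed
concave, the inverse transform relation

$$\Sigma(\epsilon) = \inf_{\beta}[\beta\epsilon + \psi(\beta)]$$

holds true as well, and so the derivatives $\beta(\epsilon) = d\Sigma/d\epsilon$ and
$\epsilon(\beta) = -d\psi/d\beta$ (which are the minimizer of $[\beta\epsilon + \psi(\beta)]$ and the
maximizer of $[\Sigma(\epsilon) - \beta\epsilon]$, respectively) are inverses of each other.
It follows then that

$$\Sigma(\epsilon) = \psi(\beta) - \beta\cdot\frac{d\psi(\beta)}{d\beta},$$

but as is readily seen, $-d\psi/d\beta$ is the average internal energy per particle,
$\lim_{N\to\infty}\frac{1}{N}\mathbf{E}\{\mathcal{E}(\mathbf{S})\}$, where $\mathbf{E}$ is the
expectation operator associated with the Boltzmann distribution. This, in turn, is readily verified
to agree with the expression of the Shannon entropy rate $H(S)$ of the distribution $P(\mathbf{s})$,

$$H(S) = \lim_{N\to\infty}\frac{1}{N}\mathbf{E}\left\{\ln\frac{1}{P(\mathbf{S})}\right\} = \lim_{N\to\infty}\frac{1}{N}\mathbf{E}\left\{\ln\frac{Z(\beta)}{\exp\{-\beta\mathcal{E}(\mathbf{S})\}}\right\} = \psi(\beta) + \beta\cdot\lim_{N\to\infty}\frac{1}{N}\mathbf{E}\{\mathcal{E}(\mathbf{S})\}. \qquad (8)$$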



Thus, $\Sigma(\epsilon) = H(S)$ whenever $\beta$ and $\epsilon$ are related by
$\beta = \beta(\epsilon)$, or equivalently, $\epsilon = \epsilon(\beta)$. For a given $\beta$, the
Boltzmann-Gibbs distribution has a sharp peak (for large N) at the level of $\epsilon(\beta)$. We
then say that this value of $\epsilon$ is the dominant energy level: Not only is it the average
energy, there is also a strong concentration of the probability about this value as N grows without
bound. The second law of thermodynamics asserts that in an isolated system (which does not exchange
energy with its environment), the total entropy cannot decrease, and hence in equilibrium, it
reaches its maximum.
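
The Legendre-transform relations above can be checked numerically on the simplest case. The sketch below assumes non-interacting two-level particles (energy 0 or ε_0), for which the per-particle quantities have standard closed forms: ψ(β) = ln(1 + exp{-β ε_0}) and Σ(ε) equal to the binary entropy (in nats) of ε/ε_0. These closed forms, the grid over β ≥ 0, and all names are assumptions introduced for illustration.

```python
import math

def psi(beta, eps0):
    """Per-particle log-partition function of a two-level particle."""
    return math.log(1.0 + math.exp(-beta * eps0))

def dominant_energy(beta, eps0):
    """epsilon(beta) = -d psi / d beta for the two-level particle."""
    return eps0 * math.exp(-beta * eps0) / (1.0 + math.exp(-beta * eps0))

def binary_entropy(u):
    """Binary entropy in nats, set to 0 at the end points."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return -u * math.log(u) - (1.0 - u) * math.log(1.0 - u)

def entropy(eps, eps0):
    """Sigma(eps): exponential growth rate of the number of microstates."""
    return binary_entropy(eps / eps0)

beta, eps0 = 1.2, 2.0
eps = dominant_energy(beta, eps0)

sigma_direct = entropy(eps, eps0)              # Sigma(eps) by counting
sigma_from_psi = psi(beta, eps0) + beta * eps  # psi(beta) + beta * eps(beta)

# Inverse Legendre transform, approximated on a grid of beta >= 0.
grid = [i * 1e-3 for i in range(20001)]
sigma_legendre = min(b * eps + psi(b, eps0) for b in grid)
print(sigma_direct, sigma_from_psi, sigma_legendre)
```

All three numbers coincide up to numerical error, which illustrates that the statistical-mechanical entropy at the dominant energy equals the Shannon entropy of the corresponding Boltzmann-Gibbs distribution in this example.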

Now, suppose that we have a physical system that is composed of two subsystems, one having N
particles with microstates $\{\mathbf{s}\}$ and Hamiltonian $\mathcal{E}_1(\mathbf{s})$, and the
other having n particles with microstates $\{\mathbf{s}'\}$ and Hamiltonian
$\mathcal{E}_2(\mathbf{s}')$. Let us suppose that these two subsystems are in thermal contact and
they both reside in a very large environment (heat bath) having a fixed temperature
$T = 1/(k\beta)$. The two subsystems are allowed to exchange energy with each other as well as with
the heat bath. How is the total energy of the system split between the two subsystems? An example of
two such subsystems was described in the first few paragraphs of the Introduction.

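The partition function of the composite system is given by

$$Z(\beta) = \sum_{\mathbf{s},\mathbf{s}'}\exp\{-\beta[\mathcal{E}_1(\mathbf{s}) + \mathcal{E}_2(\mathbf{s}')]\}$$

and so the dominant energy level, as we saw before, is the one that achieves the associated
normalized log-partition function $\psi(\beta)$, i.e., the solution $\epsilon_0$ to the equation
$d\Sigma(\epsilon)/d\epsilon = \beta$, where $\Sigma(\epsilon)$ is the entropy of the combined system.
Let us confine attention now to the set of combined microstates $\{(\mathbf{s},\mathbf{s}')\}$ of the
composite system which have energy $(N + n)\epsilon_0$. More precisely, assume that the ratio
$n/N = \lambda$ is held fixed, so $(N + n)\epsilon_0 = N(1 + \lambda)\epsilon_0$, and let us define

$$\Omega_{N,n,\delta}(\epsilon_0) = \left|\left\{(\mathbf{s},\mathbf{s}'):\ N(1+\lambda)(\epsilon_0 - \delta/2) \le \mathcal{E}_1(\mathbf{s}) + \mathcal{E}_2(\mathbf{s}') \le N(1+\lambda)(\epsilon_0 + \delta/2)\right\}\right|.$$

Clearly, every configuration $(\mathbf{s},\mathbf{s}')$ with energy about $N(1+\lambda)\epsilon_0$
corresponds to some allocation of part of the energy in one subsystem and the remaining energy in
the other. Thus, defining $\Omega^{(1)}_{N,\delta}(\epsilon)$ and $\Omega^{(2)}_{n,\delta}(\epsilon)$
as the enumerators of microstates with energy about $\epsilon$ in each one of the two subsystems
individually (as defined in eq. (6)), we have, for $\delta' = \delta(1+\lambda)$:

$$\Omega_{N,n,\delta}(\epsilon_0) \approx \sum_{j\ge 0}\Omega^{(1)}_{N,\delta'}\big((j + 1/2)\delta'\big)\cdot\Omega^{(2)}_{n,\delta'/\lambda}\left(\frac{(1+\lambda)\epsilon_0 - (j + 1/2)\delta'}{\lambda}\right).$$

Defining $\Sigma(\epsilon)$ as
$\lim_{\delta\to 0}\lim_{N\to\infty}[\ln\Omega_{N,n,\delta}(\epsilon)]/[N(1+\lambda)]$, we find, after
taking logarithms of both sides, dividing by $N(1+\lambda)$, letting $N \to \infty$, and then
$\delta \to 0$, that $\Sigma(\epsilon_0)$ is given by the weighted supremal convolution²:

$$\Sigma(\epsilon_0) = \sup_{0\le\epsilon\le(1+\lambda)\epsilon_0}\left[\frac{1}{1+\lambda}\Sigma_1(\epsilon) + \frac{\lambda}{1+\lambda}\Sigma_2\left(\frac{(1+\lambda)\epsilon_0 - \epsilon}{\lambda}\right)\right],$$

where $\Sigma_1$ and $\Sigma_2$ are the entropy functions of the two subsystems.
Assuming that the maximum is achieved by $\epsilon^* \in (0, (1+\lambda)\epsilon_0)$, it is
characterized by a vanishing derivative of the expression in the square brackets, i.e., the solution
to the equation

$$\Sigma_1'(\epsilon) = \Sigma_2'\left(\frac{(1+\lambda)\epsilon_0 - \epsilon}{\lambda}\right) \qquad (9)$$

where $\epsilon$ is the unknown, and where $\Sigma_i'$ is the derivative of $\Sigma_i$, i = 1, 2.
This equation characterizes the thermal equilibrium between the two subsystems and the heat bath.
Now, the left-hand side is exactly $\beta$. Thus, $\epsilon^*$, the per-particle energy share of the
first subsystem, is the solution to the equation $\Sigma_1'(\epsilon) = \beta$ (or, equivalently, of
eq. (9), as said), and the remaining energy per particle, $[(1+\lambda)\epsilon_0 - \epsilon^*]/\lambda$,
belongs to the other subsystem.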
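
The equilibrium condition can be illustrated numerically for the two subsystems of the Introduction. The following Python sketch maximizes the weighted sum of the two entropies over the energy split by a plain grid search and compares the maximizer with the energy that the spin subsystem would have on its own in the heat bath. Several ingredients are assumptions made for illustration: natural units (k = 1), the closed-form binary-entropy expressions for the two entropies, signed energies (the spin energy -s_i B can be negative, so the search runs over the feasible interval rather than over [0, (1+λ)ε_0]), and all function names and parameter values.

```python
import math

def h(u):
    """Binary entropy in nats; taken as 0 at and beyond the end points."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return -u * math.log(u) - (1.0 - u) * math.log(1.0 - u)

def sigma1(e, B):
    """Per-particle entropy of spins with energy -s*B: a fraction (1 - e/B)/2 points up."""
    return h(0.5 * (1.0 - e / B))

def sigma2(e, eps0):
    """Per-particle entropy of two-level particles: a fraction e/eps0 is excited."""
    return h(e / eps0)

def energy_split(beta, B, eps0, lam, steps=200001):
    # Per-particle energies of each subsystem alone in the heat bath (k = 1).
    e1_bath = -B * math.tanh(beta * B)
    e2_bath = eps0 * math.exp(-beta * eps0) / (1.0 + math.exp(-beta * eps0))
    total = e1_bath + lam * e2_bath      # plays the role of (1 + lam) * eps_0
    lo = max(-B, total - lam * eps0)     # keep both per-particle energies feasible
    hi = min(B, total)
    best_e1, best_val = lo, -math.inf
    for i in range(steps):
        e1 = lo + (hi - lo) * i / (steps - 1)
        val = (sigma1(e1, B) + lam * sigma2((total - e1) / lam, eps0)) / (1.0 + lam)
        if val > best_val:
            best_e1, best_val = e1, val
    return best_e1, e1_bath, e2_bath

e1_star, e1_bath, e2_bath = energy_split(beta=0.7, B=1.0, eps0=1.5, lam=2.0)
print(e1_star, e1_bath, e2_bath)
```

Within the grid resolution, the maximizer of the weighted supremal convolution coincides with the bath-equilibrium energy of the first subsystem, in line with the observation that both entropy derivatives equal β at the optimum.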

Comment. Returning to the example that opens the Introduction, a simple calculation shows that
the dominant energies are

$$B\cdot\mathbf{E}\left\{\sum_{i=1}^N S_i\right\} = NB\tanh\left(\frac{B}{kT}\right)$$

in the first subsystem, and

$$\mathbf{E}\left\{\epsilon_0\sum_{i=1}^n S'_i\right\} = \frac{n\epsilon_0\exp\{-\epsilon_0/kT\}}{1 + \exp\{-\epsilon_0/kT\}}$$

in the second subsystem. Thus,

$$\epsilon^* = B\tanh\left(\frac{B}{kT}\right)$$

and

$$\frac{(1+\lambda)\epsilon_0 - \epsilon^*}{\lambda} = \frac{\epsilon_0\exp\{-\epsilon_0/kT\}}{1 + \exp\{-\epsilon_0/kT\}}$$

(on the left-hand side, $\epsilon_0$ denotes the per-particle energy of the combined system, as in
the discussion above, whereas on the right-hand side it denotes the excitation energy of the
two-state particles).
In the parallel joint source-channel coding problem described in the Introduction, and to be
further studied in a more general setting in the sequel, we have
$\ln P(\mathbf{s}) = \left(\frac{1}{2}\ln\frac{q}{1-q}\right)\cdot\sum_{i=1}^N s_i + \mathrm{const}$
and
$\ln W(\mathbf{y}|\mathbf{x}) = \left(\ln\frac{p}{1-p}\right)\cdot\sum_{i=1}^n(x_i\oplus y_i) + \mathrm{const}$,
with $\oplus$ denoting modulo 2 addition. The dominant contribution to $P(\mathbf{s}|\mathbf{y})$
comes from those $\{\mathbf{s}\}$ for which $\sum_{i=1}^N s_i$ is about its typical value
$N[(+1)\cdot q + (-1)\cdot(1-q)] = N(2q-1) = N\tanh(B/kT)$ (in analogy to the energy of the
first subsystem above, where we have used the relations (1)-(4)) and $\sum_{i=1}^n(x_i\oplus y_i)$
is about $np = n\exp\{-\epsilon_0/kT\}/[1 + \exp\{-\epsilon_0/kT\}]$ (in analogy to the energy of the
second subsystem). Notice that these two typical contributions to the log-posterior probability
agree also with the corresponding typical contributions, $\ln P(\mathbf{s}_0)$ and
$\ln W(\mathbf{y}|\mathbf{x}(\mathbf{s}_0))$, of the real message $\mathbf{s}_0$ that was
actually transmitted. This is true regardless of whether the communication is reliable or not, i.e.,
it continues to hold no matter whether the entropy rate of the source is smaller or larger than
$\lambda$ times the mutual information between the input and the output of the channel.

²The supremal convolution between two functions f(x) and g(x) is generally defined as
$h(x) = \sup_y[f(x-y) + g(y)]$. The qualifier "weighted", in our context, refers to the fact that both
functions as well as their arguments are weighted by $1/(1+\lambda)$ and $\lambda/(1+\lambda)$.

Returning to the general discussion above, note that the same considerations continue to hold
even if one of the systems, say, the second one, has an effective negative entropy, that is,
$\Omega^{(2)}_{n,\delta}([(1+\lambda)\epsilon_0 - \epsilon^*]/\lambda) < 1$, which means that for
each microstate $\mathbf{s}$ of the first subsystem with per-particle energy $\epsilon^*$, only a
fraction of the compatible combined microstates $\{(\mathbf{s},\mathbf{s}')\}$ have normalized
energy $\epsilon_0$. Of course, $\Omega^{(1)}_{N,\delta}(\epsilon^*)$ must be larger than 1. In the
sequel, we shall see that in the joint source-channel coding problem, the source and the channel
constitute a mechanism which is highly parallel to that of equilibrium energy-sharing between two
subsystems in a heat bath, where the subsystem corresponding to the channel has a negative effective
thermodynamic entropy in this sense.

We should comment that in order to determine the energy sharing between the two subsystems
in the above discussion, it was not necessary to consider how they thermally interact with each other
and to go through the weighted supremal convolution between their entropies, as we did. We could
have determined these energies simply by considering the equilibrium of each one of the subsystems
individually with the heat bath,³ thus equating the derivative of each one of the entropy functions
to $\beta$. Nonetheless, we have deliberately chosen to present the supremal convolution because in
the sequel, it is this relation that will lead to the derivation of the mutual information in the
joint source-channel coding problem.

³When doing so, the other system then becomes part of the heat bath anyway.

4 Formulation, Main Results and Discussion

Consider an information source, $S_1, S_2, \ldots$, whose symbols $\{S_i\}$ take on values in a
finite alphabet $\mathcal{S}$. The source is characterized by a sequence of probability
distributions, $P(\mathbf{s})$, $\mathbf{s} = (s_1, \ldots, s_N)$, where $N = 1, 2, \ldots$. Consider
next a discrete memoryless channel (DMC), which is characterized by a matrix of single-letter
transition probabilities $\{W(y|x),\ x \in \mathcal{X},\ y \in \mathcal{Y}\}$, where $\mathcal{X}$ and
$\mathcal{Y}$ are finite alphabets. The operation rate of the channel relative to the source is
$\lambda$ channel uses per source symbol, which means that while the source produces an N-vector
$\mathbf{s} = (s_1, \ldots, s_N) \in \mathcal{S}^N$, the channel conveys n channel symbols, namely,
it receives an n-vector $\mathbf{x} = (x_1, \ldots, x_n) \in \mathcal{X}^n$ and outputs an n-vector
$\mathbf{y} = (y_1, \ldots, y_n) \in \mathcal{Y}^n$, where $n = \lambda N$. The parameter $\lambda$
is referred to as the bandwidth expansion factor of the channel relative to the source.

For the sake of convenience in drawing the analogy with statistical mechanics, we will think
of both the source and the channel as Boltzmann distributions with certain Hamiltonians at a
certain common inverse temperature $\beta$, that is, $P(\mathbf{s})$ is proportional to
$\exp\{-\beta\mathcal{E}_S(\mathbf{s})\}$ and $W(y|x)$ is proportional to
$\exp\{-\beta\mathcal{E}_C(x,y)\}$, where $\mathcal{E}_S(\cdot)$ and $\mathcal{E}_C(\cdot,\cdot)$ are
the Hamiltonians of the source and the channel, respectively. For a pair of n-vectors $\mathbf{x}$
and $\mathbf{y}$, we will denote $W(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^n W(y_i|x_i)$, and keep in
mind that it is proportional to $\exp\{-\beta\mathcal{E}_C(\mathbf{x},\mathbf{y})\}$, where
$\mathcal{E}_C(\mathbf{x},\mathbf{y}) = \sum_{i=1}^n\mathcal{E}_C(x_i,y_i)$.
Clearly, there is no loss of generality in this representation of the source and the channel since
there is always at least one way of doing this: For example, one can simply take $\beta = 1$,
$\mathcal{E}_S(\mathbf{s}) = -\ln P(\mathbf{s})$, and
$\mathcal{E}_C(\mathbf{x},\mathbf{y}) = -\ln W(\mathbf{y}|\mathbf{x})$. The point is, however, that
by doing this we have slightly extended the scope: instead of one source and one channel, we are
actually considering a family of sources and channels, both indexed by a common parameter $\beta$
that controls the degree of uniformity or skewness of the distribution.

An (N, n) joint source-channel code, for the above defined source and channel, is a mapping from
the set $\mathcal{S}^N$ to $\mathcal{X}^n$. Every source string $\mathbf{s}$ is mapped into a
channel input vector $\mathbf{x} = (x_1, \ldots, x_n)$, and when we wish to emphasize the dependence
of $\mathbf{x}$ on $\mathbf{s}$, we denote it as $\mathbf{x}(\mathbf{s})$. The code is assumed to
be selected at random, where for each $\mathbf{s}$, the codeword $\mathbf{x}(\mathbf{s})$ is drawn
under a distribution⁴ $M(\mathbf{x})$, independently⁵ of all other codewords. The receiver estimates
$\mathbf{s}$ by applying a certain function on the received channel output sequence
$\mathbf{y} = (y_1, \ldots, y_n)$, i.e., it implements a function from $\mathcal{Y}^n$ to
$\mathcal{S}^N$, which will be denoted by $\hat{\mathbf{s}} = \hat{\mathbf{s}}(\mathbf{y})$. In some
applications, the receiver (or the observer) may not necessarily attempt full-fledged decoding of
the message, but may opt to merely estimate a certain function of the source sequence (e.g., some
statistic such as its composition).

⁴A more general model would allow a distribution M that depends on $\mathbf{s}$. For example, if
$\mathcal{S}^N$ can be naturally divided into type classes (like in the case of memoryless sources,
Markov sources, etc.), then it is plausible to let M depend on the type class of $\mathbf{s}$.
However, among all sequences in $\mathcal{S}^N$, the important ones are those that are typical to
the source (others can be ignored in the large N limit), which are equiprobable in the exponential
scale, and so, the distribution M for all of them can be taken to be the same without loss of
asymptotic optimality.

⁵The independence assumption is made here mostly for the sake of simplicity. It can be somewhat
relaxed as long as the concentration properties specified below continue to hold.

Our study of the mutual information induced by the joint source-channel code will be strongly
based on the posterior distribution, which, for a given (randomly selected) code, is defined as:

$$P(\mathbf{s}|\mathbf{y}) = \frac{P(\mathbf{s})W(\mathbf{y}|\mathbf{x}(\mathbf{s}))}{\sum_{\mathbf{s}'}P(\mathbf{s}')W(\mathbf{y}|\mathbf{x}(\mathbf{s}'))} = \frac{\exp\{-\beta[\mathcal{E}_S(\mathbf{s}) + \mathcal{E}_C(\mathbf{x}(\mathbf{s}),\mathbf{y})]\}}{\sum_{\mathbf{s}'}\exp\{-\beta[\mathcal{E}_S(\mathbf{s}') + \mathcal{E}_C(\mathbf{x}(\mathbf{s}'),\mathbf{y})]\}}. \qquad (10)$$

On a technical note, observe that since the posterior distribution is given by a ratio, this allows
slightly more freedom in the definition of the Hamiltonians $\mathcal{E}_S$ and $\mathcal{E}_C$, as
certain common constants in the numerator and the denominator may cancel each other. For example, if
the source is binary and memoryless, as described in the example given in the Introduction, then
$P(\mathbf{s})$ is proportional to
$\exp\left\{-\left(\frac{1}{2}\ln\frac{1-q}{q}\right)\sum_{i=1}^N s_i\right\}$, and so one can define
$\mathcal{E}_S(\mathbf{s})$ to be proportional to $\sum_{i=1}^N s_i$, where the factor
$\frac{1}{2}\ln\frac{1-q}{q}$ can be split between a part that is absorbed in the Hamiltonian itself
and a part that is attributed to the inverse temperature parameter $\beta$. A similar comment
applies to the channel, but here some more caution is required since, in general, the constant of
proportionality that relates $W(y|x)$ and $\exp\{-\beta\mathcal{E}_C(x,y)\}$ may depend on x, unless
the code is of constant composition and/or the channel is symmetric in the sense that
$\sum_y\exp\{-\beta\mathcal{E}_C(x,y)\}$ is independent of x for all $\beta$ (which is the case,
e.g., in modulo-additive channels, like the BSC). If neither of these conditions holds (i.e., if the
code is not constant composition and the channel is not symmetric), we keep the choice of
$\mathcal{E}_C(x,y)$ as being proportional to $-\ln W(y|x)$.

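For a given choice of the Hamiltonians $\mathcal{E}_S$ and $\mathcal{E}_C$, in view of these
considerations, let us define the joint source-channel partition function as the denominator of the
posterior distribution, i.e.,

$$Z(\beta|\mathbf{y}) = \sum_{\mathbf{s}\in\mathcal{S}^N}\exp\{-\beta[\mathcal{E}_S(\mathbf{s}) + \mathcal{E}_C(\mathbf{x}(\mathbf{s}),\mathbf{y})]\}.$$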
In the course of studying the properties of a typical realization of the joint source-channel partition 
function, pertaining to a given code ensemble, we will make a few observations, which were already 
mentioned briefly in the Introduction: 

1. Similarly to results that have already been observed in the context of the pure channel coding
problem [12], the statistical-mechanical system pertaining to $Z(\beta|\mathbf{y})$ undergoes a
phase transition, which corresponds, in the realm of coded systems, to the transition between
reliable and unreliable communication, namely, the point at which the entropy rate of the source
exceeds the mutual information between the input and the output of the channel.

2. When identifying the set of source vectors $\{\mathbf{s}\}$ that dominates
$Z(\beta|\mathbf{y})$ (i.e., those that contribute most to $Z(\beta|\mathbf{y})$) above the phase
transition temperature, one observes a situation that parallels that of thermal equilibrium between
two physical subsystems, one corresponding to the source and the other corresponding to the channel.
To be more specific, if
$\mathcal{E}(\mathbf{s},\mathbf{y}) = \mathcal{E}_S(\mathbf{s}) + \mathcal{E}_C(\mathbf{x}(\mathbf{s}),\mathbf{y})$
is thought of as the total 'energy' shared by the source and the code/channel, then the dominant
messages $\{\mathbf{s}\}$ split this total average energy between the source and the channel
components in a way that corresponds to thermal equilibrium between the two parallel physical
subsystems.

3. The balance between the thermodynamical entropies of the two physical subsystems that lie
in equilibrium, as described in item no. 2, is identified with the simple relation between the
corresponding Shannon entropies of the source, namely, the unconditional source entropy and
the conditional entropy given the channel output, whose difference is the mutual information
between the source and the channel output. This gives rise to a simple formula for the mutual
information rate induced by a typical code in the ensemble.

In analogy to the definitions and the assumptions outlined in Section 3, we now make a few 
definitions and assumptions concerning the joint source-channel coding model. 



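A.1 Defining

$$\Omega_S(\epsilon) = \left|\left\{\mathbf{s}\in\mathcal{S}^N:\ (\epsilon - \delta/2)N \le \mathcal{E}_S(\mathbf{s}) \le (\epsilon + \delta/2)N\right\}\right|,$$

our first assumption is that

$$\Sigma_S(\epsilon) = \lim_{\delta\to 0}\lim_{N\to\infty}\frac{\ln\Omega_S(\epsilon)}{N}$$

exists and that $\Sigma_S(\epsilon)$ is a differentiable concave function.

A.2 For a given $\mathbf{y}$, define

$$\phi_{n,\delta}(\epsilon|\mathbf{y}) = \frac{1}{n}\ln\Pr\left\{n(\epsilon - \delta/2) \le \mathcal{E}_C(\mathbf{X},\mathbf{y}) \le n(\epsilon + \delta/2)\right\},$$

where the random vector $\mathbf{X}$ is drawn under the random coding distribution M, independently
of $\mathbf{y}$. Then, our second assumption is that for all $\epsilon \ge 0$,
$\mathbf{E}\{\phi_{n,\delta}(\epsilon|\mathbf{Y})\}$ tends uniformly to a differentiable function
$\phi(\epsilon)$ as $n \to \infty$ followed by $\delta \to 0$, where the expectation $\mathbf{E}$ is
w.r.t. both the random selection of the codebook and the random actions of the source and the
channel. Moreover, we assume that $\phi_{n,\delta}(\epsilon|\mathbf{Y})$ tends to $\phi(\epsilon)$
uniformly almost surely in the same double limit.

A.3 Let $\Sigma_S(\epsilon)$ and $\phi(\epsilon)$ be defined as above, and let $\Sigma_0(\epsilon)$
be defined by the weighted supremal convolution

$$\Sigma_0(\epsilon) = \max_{0\le\epsilon'\le(1+\lambda)\epsilon}\left[\frac{1}{1+\lambda}\Sigma_S(\epsilon') + \frac{\lambda}{1+\lambda}\phi\left(\frac{(1+\lambda)\epsilon - \epsilon'}{\lambda}\right)\right].$$

Our third assumption is that $\Sigma_0(\epsilon)$ is a concave function throughout the range of
$\epsilon$ where it is non-negative. We now define

$$\Sigma(\epsilon) = \begin{cases}\Sigma_0(\epsilon) & \Sigma_0(\epsilon) \ge 0\\ -\infty & \Sigma_0(\epsilon) < 0\end{cases}$$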


As we shall see below, while $\Sigma_0(\epsilon)$ gives the normalized logarithm of the expected
number of configurations with total energy $\epsilon$, the function $\Sigma(\epsilon)$ gives the
normalized logarithm of the number of such configurations for a typical code in the ensemble. To see
this, note that if $\Sigma_S(\epsilon') + \lambda\phi([(1+\lambda)\epsilon - \epsilon']/\lambda) < 0$
for all $\epsilon'$, this means that for every $\epsilon'$ the product of the number of
configurations $\{\mathbf{s}\}$ for which $\mathcal{E}_S(\mathbf{s})$ is about $N\epsilon'$ and the
probability that a randomly chosen codeword would provide the complementary energy,
$[(1+\lambda)\epsilon - \epsilon']/\lambda$ per channel use, is less than one (in fact,
exponentially small in N), which means that there is a very low probability to find any
configuration with total energy $\epsilon$, and so, $\Sigma(\epsilon)$, which is the normalized
logarithm of the number of such configurations (i.e., the thermodynamical entropy of the combined
system), is equal to $-\infty$ for a typical code realization. Note that the concavity of
$\Sigma_0(\epsilon)$ across the range where it is non-negative implies that $\Sigma(\epsilon)$ is
concave as well.
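
To get a feel for the difference between the two functions, the following Python sketch evaluates them in one concrete, assumed setting that is not taken from the paper: the source Hamiltonian counts the number of -1 symbols, so that the source entropy is the binary entropy h(ε') in nats; the channel Hamiltonian is the Hamming distance between x and y; and codewords are drawn uniformly, so that φ(ε) = h(ε) - ln 2. The maximization is a plain grid search restricted to the interval where both per-symbol energies lie in [0, 1].

```python
import math

def h(u):
    """Binary entropy in nats; taken as 0 at and beyond the end points."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return -u * math.log(u) - (1.0 - u) * math.log(1.0 - u)

def sigma_source(e):
    """Assumed source entropy: E_S(s) counts the -1 symbols, so Sigma_S(e) = h(e)."""
    return h(e)

def phi_channel(e):
    """Assumed channel exponent: Hamming-distance Hamiltonian, uniform codewords."""
    return h(e) - math.log(2.0)

def sigma0(eps, lam, steps=10001):
    """Weighted supremal convolution of Sigma_S and phi, for 0 <= eps <= 1."""
    total = (1.0 + lam) * eps
    lo = max(0.0, total - lam)  # channel energy per symbol must stay in [0, 1]
    hi = min(1.0, total)        # source energy per symbol must stay in [0, 1]
    grid = (lo + (hi - lo) * i / (steps - 1) for i in range(steps))
    return max((sigma_source(e1) + lam * phi_channel((total - e1) / lam)) / (1.0 + lam)
               for e1 in grid)

def sigma(eps, lam):
    """Typical-code entropy: Sigma_0 where it is non-negative, -infinity elsewhere."""
    value = sigma0(eps, lam)
    return value if value >= 0.0 else -math.inf

lam = 2.0
print(sigma0(0.05, lam), sigma(0.05, lam))  # negative exponent, so no typical configurations
print(sigma0(0.5, lam), sigma(0.5, lam))    # non-negative exponent, the two functions agree
```

At low total energy the expected number of configurations decays exponentially, so the typical-code entropy is minus infinity, whereas at ε = 1/2 both functions equal (ln 2)/3 in this toy setting.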

In analogy to the discussion of the previous section, let us define

$$Z_S(\beta) = \sum_{\mathbf{s}}\exp\{-\beta\mathcal{E}_S(\mathbf{s})\}.$$

Then,

$$\psi_S(\beta) = \lim_{N\to\infty}\frac{1}{N}\ln Z_S(\beta)$$

and $\Sigma_S(\epsilon)$ are a Legendre-transform pair. Since $\Sigma_S(\cdot)$ is assumed concave,
the inverse transform relation

$$\Sigma_S(\epsilon) = \inf_{\beta}[\beta\epsilon + \psi_S(\beta)]$$

holds true as well, and so the derivatives $\beta_S(\epsilon) = d\Sigma_S/d\epsilon$ and
$\epsilon_S(\beta) = -d\psi_S/d\beta$ are inverses of each other. It follows then that the Shannon
entropy rate $H(S)$ of $P(\mathbf{s})$ (which depends on $\beta$) agrees with $\Sigma_S(\epsilon)$
whenever $\beta$ and $\epsilon$ are related by $\beta = \beta_S(\epsilon)$, or equivalently,
$\epsilon = \epsilon_S(\beta)$.

Referring to the partition function $Z(\beta|y)$, let us distinguish between the contribution of the true sequence $s_0$ that the source actually emitted, i.e., $Z_c(\beta|y) = \exp\{-\beta[\mathcal{E}_S(s_0) + \mathcal{E}_C(x(s_0),y)]\}$, and the contribution of all other (erroneous) source vectors,

$$Z_e(\beta|y) = \sum_{s \ne s_0} \exp\{-\beta[\mathcal{E}_S(s) + \mathcal{E}_C(x(s),y)]\}.$$

Now, $\ln Z_c(\beta|y)$ is typically around $-\beta[E\{\mathcal{E}_S(S)\} + E\{\mathcal{E}_C(X(S),Y)\}]$. As for $Z_e(\beta|y)$, let us define

$$\Omega_{N,\delta}(e|y) = \left|\left\{s \ne s_0:\ N(1+\lambda)(e-\delta/2) \le \mathcal{E}_S(s) + \mathcal{E}_C(x(s),y) < N(1+\lambda)(e+\delta/2)\right\}\right|.$$



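Then, similarly as in the previous section, one readily observes that for $\delta' = \delta(1+\lambda)$, we have:

$$\begin{aligned}
\Omega_{N,\delta'}(e|y) &= \sum_{j \ge 0} e^{N\Sigma_S((j+1/2)\delta)} \times \Pr\{N(1+\lambda)(e-\delta'/2) - N(j+1)\delta \le \mathcal{E}_C(X,y) < N(1+\lambda)(e+\delta'/2) - Nj\delta\} \\
&= \sum_{j \ge 0} e^{N\Sigma_S((j+1/2)\delta)} \exp\{n\phi_{n,\delta}([(1+\lambda)e - (j+1/2)\delta]/\lambda \,|\, y)\}
\end{aligned} \tag{11}$$

Taking logarithms of both sides, dividing by $N + n = N(1+\lambda)$, letting $N$ grow without bound, and finally letting $\delta$ go to zero, we obtain† that:

$$\lim_{\delta\to 0}\lim_{N\to\infty} \frac{\ln\Omega_{N,\delta'}(e|Y)}{N(1+\lambda)} \overset{\text{a.s.}}{=} \begin{cases} \Sigma_0(e) & \Sigma_0(e) \ge 0 \\ -\infty & \Sigma_0(e) < 0 \end{cases}$$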

but the r.h.s. is exactly $\Sigma(e)$. Thus, as explained earlier, $\Sigma(e)$ is the thermodynamical entropy associated with the combined source-channel system. The concavity of $\Sigma(e)$ then implies that it agrees (after the appropriate scaling) with the conditional Shannon entropy rate of the source given the channel output, $H(S|Y)$, i.e., the entropy rate pertaining to the sequence of conditional probabilities $P(s|y)$ defined above. For a given $e$ in the range where $\Sigma(e)$ is finite, let $e' = e^*$ achieve the supremum defining $\Sigma(e)$.

†At this point, we are using the fact [12], [11] that for an ensemble of independently selected codewords, the number of codewords which contribute energy $\mathcal{E}_C(X,y) \approx n[(1+\lambda)e - e']/\lambda$ is with very high probability zero if $\Sigma_S(e') + \lambda\phi([(1+\lambda)e - e']/\lambda) < 0$, and around $\exp\{N[\Sigma_S(e') + \lambda\phi([(1+\lambda)e - e']/\lambda)]\}$ if $\Sigma_S(e') + \lambda\phi([(1+\lambda)e - e']/\lambda) > 0$. The assumption of independent codewords can be relaxed as long as this concentration property continues to hold.

At this point, one should distinguish between two situations. In the first situation, $e$ is on the boundary of the range where $\Sigma(e)$ is finite and positive, namely, $\Sigma(e) = 0$. In this case, the partition function $Z(\beta|y)$ (and hence also $P_\beta(s|y)$) is dominated by a subexponential number of configurations $\{s\}$, and so the entropy rate $H(S|Y) = 0$, which means that the system is frozen in its glassy phase (cf. [12], [11] and references therein). In the second situation, $e$ is an internal point of the range where $\Sigma(e) > 0$, where we will also assume that $e^* \in (0, (1+\lambda)e)$, which is the paramagnetic phase (or the disordered phase) of $Z_e(\beta|y)$. Then, the derivative of the function being maximized vanishes, i.e.,



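$$\frac{d\Sigma_S(e')}{de'}\bigg|_{e'=e^*} = \frac{d\phi(e'')}{de''}\bigg|_{e''=[(1+\lambda)e-e^*]/\lambda},$$

or equivalently,

$$\Sigma_S'(e^*) = \phi'\!\left(\frac{(1+\lambda)e - e^*}{\lambda}\right), \tag{12}$$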

where $\Sigma_S'$ and $\phi'$ denote the derivatives of $\Sigma_S$ and $\phi$, respectively. As before, eq. (12) gives rise to thermal equilibrium between the physical system corresponding to the source and the one that pertains to the code/channel. Next observe that the left-hand side is exactly $\beta_S(e^*)$. Thus,

$$\beta_S(e^*) = \phi'\!\left(\frac{(1+\lambda)e - e^*}{\lambda}\right),$$

which means that given the value of the total per-particle energy $e$, we can find how the dominant codewords split the energy between the source and the channel: we can solve the above equation with the given $e$, with $e^*$ as an unknown. Then, the source contribution will be $e^*$ and the channel contribution will be $[(1+\lambda)e - e^*]/\lambda$.
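
The energy split can be found numerically. The sketch below is a minimal bisection solver for the equilibrium equation, assuming that both $\beta_S(\cdot)$ and $\phi'(\cdot)$ are decreasing functions (as they are when $\Sigma_S$ and $\phi$ are concave). The toy instance is hypothetical: a binary source whose energy is $B$ per "one" (so that $\Sigma_S(e) = h_2(e/B)$), fair coin-tossing codewords, and Hamming-distance channel energy. The parameter `B` and the function names are introduced only for this illustration.

```python
import math

def solve_energy_split(e, lam, beta_s, dphi, lo, hi, tol=1e-12):
    """Bisection for e* solving beta_s(e*) = dphi(((1+lam)*e - e*)/lam).

    Assumes beta_s and dphi are both decreasing, so the gap below is
    decreasing in e* and changes sign exactly once on (lo, hi).
    """
    def gap(e_src):
        return beta_s(e_src) - dphi(((1 + lam) * e - e_src) / lam)

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid   # gap still positive: the root lies to the right
        else:
            hi = mid   # gap non-positive: the root lies to the left
    return 0.5 * (lo + hi)

def log_odds(x):
    # Derivative of the binary entropy h2(x) in nats.
    return math.log((1 - x) / x)

# Hypothetical toy instance (all values are illustrative assumptions).
B = 2.0                 # assumed source energy per "one"
lam, e = 1.0, 0.5       # channel symbols per source symbol, total energy per particle
eps = 1e-9              # keeps the search away from the singular endpoints
lo = max(0.0, (1 + lam) * e - lam) + eps
hi = min(B, (1 + lam) * e) - eps
e_star = solve_energy_split(e, lam, lambda x: log_odds(x / B) / B, log_odds, lo, hi)
e_channel = ((1 + lam) * e - e_star) / lam
```

In the symmetric case $B = 1$ the two derivative functions coincide, and the solver returns $e^* = e$: source and channel then carry the same energy per particle.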

The discussion above holds for every value of $e$ for which $\Sigma(e) > 0$. The dominant value of $e$ is $e_0$, the one that achieves $E\{\ln Z(\beta|Y)\}/[N(1+\lambda)]$ for large $N$, in other words, the achiever of:

$$\psi(\beta) = \lim_{N\to\infty}\frac{E\{\ln Z(\beta|Y)\}}{N(1+\lambda)} = \sup_{e}\,[\Sigma(e) - \beta e].$$

Thus, the dominant value of $e$, which is relevant for the previous paragraph, is $e_0$, which in turn depends only on $\beta$. But since $\Sigma$ is assumed concave, $\psi$ and $\Sigma$ are also a Legendre-transform pair, and so $e_0$ and $\beta$ are related via the derivatives, $e_0 = e(\beta) = -\psi'(\beta)$ and $\beta = \beta(e) = \Sigma'(e)$, where again, primes denote the derivatives. In summary, given $\beta$, we have $e_0 = e(\beta)$ and $e^* = e_S(\beta)$. Thus, $\beta_S(e^*)$ in the equilibrium equation is $\beta_S(e_S(\beta)) = \beta$, since $\beta_S(\cdot)$ and $e_S(\cdot)$ are inverses of one another. The equilibrium equation applied to the dominant energy $e_0$ therefore becomes

$$\phi'\!\left(\frac{(1+\lambda)e_0 - e^*}{\lambda}\right) = \beta.$$



If, in addition, $\phi$ is concave, then $\phi'$ is monotone, and thus has an inverse, which is given by the negative derivative of the Legendre transform of $\phi$, that is, by the negative derivative of

$$\zeta(t) = \sup_{e}\,[\phi(e) - et],$$

and then

$$\frac{(1+\lambda)e_0 - e^*}{\lambda} = -\zeta'(\beta).$$

Now observe that if, for a typical $y$, either $Z_c(\beta|y)$ dominates $Z_e(\beta|y)$, or $Z_e(\beta|y)$ is in its frozen phase, then $H(S|Y)$ vanishes, and so the mutual information rate is $\lim_{N\to\infty} I(S;Y)/N = H(S)$. For the complementary case, our main result is the following:

Theorem 1 Let $E\{I(S;Y)\}$ denote the expected mutual information, where the expectation is taken w.r.t. the ensemble of joint source-channel codes. Then, under Assumptions A.1-A.3:

$$\lim_{N\to\infty}\frac{E\{I(S;Y)\}}{N} = -\lambda\phi(-\zeta'(\beta)),$$

provided that $\Sigma(e_0) > 0$.

Remark: From the above discussion, it is apparent that this result applies also to the almost-sure limit of $I(S;Y)/N$ w.r.t. the code ensemble.

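Proof.

$$\begin{aligned}
\lim_{N\to\infty}\frac{E\{I(S;Y)\}}{N} &= H(S) - H(S|Y) \\
&= \Sigma_S(e^*) - (1+\lambda)\Sigma(e_0) \\
&= -\lambda\phi\!\left(\frac{(1+\lambda)e_0 - e^*}{\lambda}\right) \\
&= -\lambda\phi(-\zeta'(\beta)). \qquad \Box
\end{aligned} \tag{13}$$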

Discussion. We have thus obtained a very simple formula which depends solely on the random coding distribution. But what is the meaning of $\zeta'(\beta)$? Since $-\phi(e)$ is, in fact, the large deviations rate function for the event $\mathcal{E}_C(X,y) \le ne$, and $\zeta(t)$ is its Legendre transform, it must be the almost-sure limit of the log-moment generating function, that is,

$$\zeta(t) \overset{\text{a.s.}}{=} \lim_{n\to\infty}\frac{1}{n}\ln\sum_{x} M(x)e^{-t\mathcal{E}_C(x,Y)},$$

where, as defined above, $M$ is the random coding distribution that governs each one of the independent, randomly selected codewords. Thus,

$$-\zeta'(\beta) \overset{\text{a.s.}}{=} \lim_{n\to\infty}\frac{1}{n}\cdot\frac{\sum_{x} M(x)\mathcal{E}_C(x,Y)e^{-\beta\mathcal{E}_C(x,Y)}}{\sum_{x} M(x)e^{-\beta\mathcal{E}_C(x,Y)}}.$$

But the Boltzmann weight $e^{-\beta\mathcal{E}_C(x,y)}$ is proportional to $W(y|x)$, and so $-\zeta'(\beta)$ is exactly the asymptotic almost-sure normalized conditional expectation of the energy, $\lim_{n\to\infty}\frac{1}{n}E\{\mathcal{E}_C(X,Y)|Y\}$, stemming from the action of the channel on the message $x(s_0)$ that was actually transmitted. This quantity in turn is assumed to concentrate about its mean, which is $\lim_{n\to\infty} E\{\mathcal{E}_C(X,Y)\}/n$. Thus, $Z_e(\beta|y)$ and $P(s|y)$ are dominated by (erroneous) sequences $\{s\}$ whose normalized energy $e_0$ consists of a source contribution $e^* = \lim_{N\to\infty} E\{\mathcal{E}_S(S)\}/N$, and a channel contribution, $[(1+\lambda)e_0 - e^*]/\lambda$, that agrees with the normalized energy generated by the noise, i.e., it agrees with $\lim_{n\to\infty} E\{\mathcal{E}_C(X,Y)\}/n$, where $X$ and $Y$ are related via the channel $W$. Moreover, this is also the typical energy composition of the true message $s_0$ that was actually transmitted (cf. the definition of $Z_c(\beta|y)$). Thus, the above conclusion holds true regardless of whether the entropy rate of the source is smaller (in which case $s_0$ dominates $Z(\beta|y)$) or larger than $\lambda$ times the normalized mutual information between $X$ and $Y$ (in which case, erroneous messages dominate $Z(\beta|y)$ for a typical $y$). We have already seen this behavior in the special case of the binary source and the BSC.
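
The identification of $-\zeta'(\beta)$ with the average channel energy can be checked numerically in a simple case. The sketch below assumes a BSC with crossover probability $p$, Hamming-distance energy (so that $\beta = \ln[(1-p)/p]$), and codeword symbols drawn i.i.d. Bernoulli($m$). Under these assumptions the log-moment generating function becomes single-letter, $\zeta(t) = E_Y \ln\sum_x M(x)e^{-t\,d(x,Y)}$ with $Y$ Bernoulli with parameter $m(1-p) + p(1-m)$. This closed form and the function names belong to the illustration, not to the derivation above.

```python
import math

def zeta(t, m, p):
    """Single-letter log-MGF for Bernoulli(m) code symbols over a BSC(p)."""
    r = m * (1 - p) + p * (1 - m)          # Pr{Y = 1}
    w = math.exp(-t)                        # Boltzmann weight of one bit flip
    given_y0 = math.log((1 - m) + m * w)    # x = 0 agrees with y = 0, x = 1 does not
    given_y1 = math.log(m + (1 - m) * w)    # x = 1 agrees with y = 1, x = 0 does not
    return (1 - r) * given_y0 + r * given_y1

def minus_zeta_prime(t, m, p, h=1e-6):
    # Central finite difference approximating -d zeta / dt.
    return -(zeta(t + h, m, p) - zeta(t - h, m, p)) / (2 * h)

p, m = 0.1, 0.3                      # illustrative channel and code parameters
beta = math.log((1 - p) / p)         # inverse temperature matched to the BSC
avg_energy = minus_zeta_prime(beta, m, p)
```

Here `avg_energy` agrees with $p$ up to the finite-difference error, i.e., with the normalized Hamming distance that the channel noise generates on average.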

Example 1. Suppose that the channel is a BSC and codewords are generated by fair coin tossing. In this case, $W(y|x)$ is proportional to $\exp\{-\beta\mathcal{E}_C(x,y)\}$, where $\mathcal{E}_C(x,y)$ is the Hamming distance and $\beta = \ln\frac{1-p}{p}$. In this case, $\phi(e) = h_2(e) - \ln 2$, whose derivative is $\phi'(e) = \ln\frac{1-e}{e}$, and so $-\zeta'(\beta)$, the inverse of $\phi'(e)$, is given by $-\zeta'(\beta) = 1/(1+e^{\beta}) = p$. It follows then that if, in addition, the source is binary and memoryless with a parameter $q$, then $P(s|y)$ is dominated by vectors $\{s\}$ whose energy is as described in the Introduction. Also, the normalized mutual information is $-\lambda\phi(-\zeta'(\beta)) = -\lambda\phi(p) = \lambda(\ln 2 - h_2(p))$. Somewhat more generally, let each coordinate $X_i(s)$, $i = 1,\ldots,n$, of each codeword be drawn i.i.d. with probabilities $\Pr\{X_i(s) = 1\} = 1 - \Pr\{X_i(s) = 0\} = m$. Then, it is easy to show (using the method of types [2]) that

$$-\phi(p) = \min_{\{P_{X|Y}:\ E\,d(X,Y) \le p\}} [I(X;Y) + D(P_X\|M)], \qquad Y \sim \text{Bernoulli}(m * p),$$

where $m * p$ means the binary convolution between $m$ and $p$ (i.e., $m * p = m(1-p) + p(1-m)$), $d(\cdot,\cdot)$ is the Hamming distance, and $P_X$ is the marginal of $X$ induced by $Y$ (which is Bernoulli($m * p$)) and the reversed channel $P_{X|Y}$ to be optimized. By eliminating the divergence term, we are lower bounding $-\phi(p)$ by the rate-distortion function of $Y$ at Hamming distortion $p$, which is $h_2(m * p) - h_2(p)$. On the other hand, returning to the original minimization problem, by selecting $P_{X|Y}$ (instead of minimizing over $P_{X|Y}$) to be the reverse channel induced by $M$ and $W_{Y|X}$ (which is the BSC($p$)), we are getting the same quantity also as an upper bound. Thus, $-\phi(p) = h_2(m * p) - h_2(p)$, and so,

$$\lim_{N\to\infty}\frac{E\{I(S;Y)\}}{N} = \lambda[h_2(m * p) - h_2(p)].$$
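
The closed-form value of $-\phi(p)$ in Example 1 can be cross-checked by a brute-force minimization. The sketch below parametrizes the reverse channel $P_{X|Y}$ by its two flip probabilities and, on the assumption that the distortion constraint is active at the optimum, searches only along the boundary $E\,d(X,Y) = p$. The function names, the grid size and the numerical values of $m$ and $p$ used when calling it are illustrative choices.

```python
import numpy as np

def h2(x):
    """Binary entropy in nats; clipping avoids log(0)."""
    x = np.clip(x, 1e-15, 1 - 1e-15)
    return -x * np.log(x) - (1 - x) * np.log(1 - x)

def kl_bernoulli(a, b):
    """Divergence D(Bernoulli(a) || Bernoulli(b)) in nats."""
    a = np.clip(a, 1e-15, 1 - 1e-15)
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

def minus_phi_numeric(m, p, grid=200001):
    """Minimize I(X;Y) + D(P_X || M) over reverse channels with E d(X,Y) = p."""
    r = m * (1 - p) + p * (1 - m)                        # Pr{Y = 1} = m * p
    a = np.linspace(0.0, min(1.0, p / (1 - r)), grid)   # Pr{X != Y | Y = 0}
    b = (p - (1 - r) * a) / r                            # Pr{X != Y | Y = 1}, from the constraint
    keep = (b >= 0) & (b <= 1)
    a, b = a[keep], b[keep]
    px1 = (1 - r) * a + r * (1 - b)                      # induced marginal Pr{X = 1}
    mutual_info = h2(px1) - (1 - r) * h2(a) - r * h2(b)  # H(X) - H(X|Y)
    return float(np.min(mutual_info + kl_bernoulli(px1, m)))

def minus_phi_closed_form(m, p):
    r = m * (1 - p) + p * (1 - m)
    return float(h2(r) - h2(p))

def mutual_information_rate(lam, m, p):
    # Normalized mutual information of Example 1, in nats per source symbol.
    return lam * minus_phi_closed_form(m, p)
```

For instance, `minus_phi_numeric(0.3, 0.1)` and `minus_phi_closed_form(0.3, 0.1)` agree to within the grid resolution, and for $m = 1/2$ the rate reduces to $\lambda(\ln 2 - h_2(p))$.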

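Comment: An alternative view on the derivation of the asymptotic mutual information rate between $S$ and $Y$ comes from the following chain of equalities:

$$\begin{aligned}
\lim_{N\to\infty}\frac{E\{I(S;Y)\}}{N} &= \lim_{N\to\infty}\frac{1}{N}E\left\{\ln\frac{P(Y|S)}{P(Y)}\right\} \\
&= \lim_{N\to\infty}\frac{1}{N}E\{\ln\exp\{-\beta\mathcal{E}_C(X(S),Y)\}\} - \lim_{N\to\infty}\frac{1}{N}E\left\{\ln\left[\frac{1}{Z_S(\beta)}\sum_{s}\exp\{-\beta[\mathcal{E}_S(s) + \mathcal{E}_C(X(s),Y)]\}\right]\right\} \\
&= -\beta[(1+\lambda)e_0 - e^*] + \psi_S(\beta) - \Sigma_S(e^*) - \lambda\phi\!\left(\frac{(1+\lambda)e_0 - e^*}{\lambda}\right) + \beta(1+\lambda)e_0 \\
&= -\lambda\phi\!\left(\frac{(1+\lambda)e_0 - e^*}{\lambda}\right)
\end{aligned} \tag{14}$$

where we have used the fact that the summation over $s$ is dominated by configurations with per-particle energy $e_0$, which is allocated as $e^*$ and $[(1+\lambda)e_0 - e^*]/\lambda$.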
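
The last step of this chain relies on a cancellation that may be worth spelling out. Assuming, as established above, that $e^* = e_S(\beta)$, the Legendre relation between $\Sigma_S$ and $\psi_S$ gives $\Sigma_S(e^*) = \beta e^* + \psi_S(\beta)$, so that the terms not involving $\phi$ sum to zero:

```latex
\begin{align*}
&-\beta[(1+\lambda)e_0 - e^*] + \psi_S(\beta) - \Sigma_S(e^*)
  - \lambda\phi\!\left(\frac{(1+\lambda)e_0 - e^*}{\lambda}\right) + \beta(1+\lambda)e_0 \\
&\qquad = \underbrace{\beta e^* + \psi_S(\beta) - \Sigma_S(e^*)}_{=\,0}
  \;-\; \lambda\phi\!\left(\frac{(1+\lambda)e_0 - e^*}{\lambda}\right).
\end{align*}
```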

5 Application to the Wiretap Channel 



In this section, we demonstrate how our results apply to the wiretap channel. Wyner, in his well-known paper on the wiretap channel [14], studied the problem of secure communication across a degraded broadcast channel, without using a secret key, where the legitimate receiver has access to the output of the good channel and the wiretapper receives the output of the bad channel. In that paper, Wyner characterized the optimum trade-off between reliable coding rates and the equivocation at the wiretapper, which was defined in terms of the conditional entropy of the source given the output of the bad channel, observed by the wiretapper.

Consider a DMS $P$ as before, and a cascade of two finite alphabet DMC's: $W_{Y|X}$ followed immediately by $W_{Z|Y}$, both‡ operating at a relative rate of $\lambda$ channel symbols per source symbol. The source $s \in \mathcal{S}^N$ is encoded to a channel input vector $x(s) \in \mathcal{X}^n$, $n = \lambda N$, and then transmitted. A code for the wiretap channel should be designed in such a way that, on the one hand, the legitimate receiver is required to estimate the source $s$ from the output $y \in \mathcal{Y}^n$ of the channel $W_{Y|X}$ within an arbitrarily small probability of error, whereas on the other hand, the eavesdropper, which has access to $z \in \mathcal{Z}^n$, should be able to learn as little as possible about the source, in the sense that the asymptotic equivocation, $\Delta = \limsup_{N\to\infty} H(S|Z)/N$, should be as large as possible. Wyner showed [14] that the largest achievable value of $\Delta$ is given by $\lambda\Gamma(H(S)/\lambda)$, where

$$\Gamma(R) = \sup_{P_X:\ I(X;Y) \ge R} [I(X;Y) - I(X;Z)].$$
‡The notation of the output of the second channel, $Z$, should not be confused with the notation of the partition function, since we do not refer to the partition function in this section.

In particular, the secrecy capacity $C_s$, which is the solution to the equation $R = \Gamma(R)$, is the rate at which the potential secrecy that the wiretap channel can offer is fully exploited: If the entropy of the source, $H(S)/\lambda$, is less than or equal to $C_s$ (supposing that $\lambda$ can be chosen in such a way), then the coding scheme of [14] that asymptotically achieves $\Delta = H(S)$ works as follows: Let $X^*$ be the random variable $X$ that achieves $\Gamma(R)$, for some $R$ in the range $H(S)/\lambda \le R \le C_s$, and let $Y^*$ and $Z^*$ be the corresponding outputs of the two channels. We first compress the source $S$ to its entropy, and then apply channel coding so that the good receiver can still decode reliably for large $N$ and $n$, but the bad one cannot. Now, since $H(S)/\lambda \le C_s$, then by the definitions of $\Gamma(\cdot)$ and $C_s$, $I(X^*;Y^*) \ge H(S)/\lambda + I(X^*;Z^*)$. Accordingly, the channel codebook is composed of about $e^{NH(S)} = e^{nH(S)/\lambda}$ bins (one for each typical source sequence), each of size slightly less than $e^{nI(X^*;Z^*)}$. The codeword actually transmitted is randomly chosen among all codewords of the bin pertaining to the index of the compressed source sequence. Note that the eavesdropper could have decoded the message had it been informed of the bin to which the transmitted codeword belongs, since the rate of the bin, as said, is (slightly) less than $I(X^*;Z^*)$. The idea then is that this information is irrelevant since it is independent of the source vector, and so it does not help the eavesdropper in learning anything about the source. Indeed, if we represent the transmitted codeword $x$ as $f(c(s),u)$, where $c(s)$ stands for the bit string of the lossless compression of $s$, indicating the bin index using $nH(S)/\lambda$ nats, and $u$ as an independent random bit string of length $nI(X^*;Z^*)$ nats, then we have the following: On the one hand,

$$H(X|Z) \le H(c(S),U|Z) = H(c(S)|Z) + H(U|Z,c(S)),$$

where the term $H(U|Z,c(S))$ essentially vanishes since, as mentioned above, every bin forms a channel sub-code that is reliably decodable by the eavesdropper. On the other hand,

$$H(X|Z) = H(X) - I(X;Z),$$

thus the equivocation achieved is:

$$H(S|Z) \ge H(c(S)|Z) \approx H(X) - I(X;Z),$$

where the first term on the r.h.s. is essentially $n[H(S)/\lambda + I(X^*;Z^*)]$ and the second term, which is a mutual information induced by a code above capacity, can be evaluated using our above results, provided that the channel code is randomly selected from an ensemble that satisfies our assumptions. For example, if the codewords are chosen i.i.d. according to the distribution of $X^*$, then $I(X;Z)$ is approximately $nI(X^*;Z^*)$, and then full secrecy is achieved as $H(S|Z)/N$ is essentially equal to $H(S)$. Nonetheless, since the rate of the code, $H(S)/\lambda + I(X^*;Z^*)$, is less than $I(X^*;Y^*)$, the legitimate decoder can still decode reliably. Our results can be used also to assess the secrecy achieved by random variables other than i.i.d. according to $X^*$, while ensuring that the good decoder can still decode reliably.
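
The rate accounting behind this scheme can be illustrated with a small numerical sketch. It assumes, hypothetically, that the main channel is a BSC($p_1$), that the wiretapper observes the cascade of that BSC with a BSC($p_2$), and that $X^*$ is uniform on $\{0,1\}$. None of these choices is made in the text above, and the function names are illustrative. All rates are in nats per channel symbol.

```python
import math

def h2(x):
    """Binary entropy in nats."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log(x) - (1 - x) * math.log(1 - x)

def conv(a, b):
    """Binary convolution a * b = a(1-b) + b(1-a)."""
    return a * (1 - b) + b * (1 - a)

def wiretap_rates(source_entropy, lam, p1, p2):
    """Rate bookkeeping for uniform inputs over a degraded pair of BSCs."""
    i_xy = math.log(2) - h2(p1)              # I(X*;Y*) for the main BSC(p1)
    i_xz = math.log(2) - h2(conv(p1, p2))    # I(X*;Z*) for the cascade, a BSC(p1 * p2)
    bin_index_rate = source_entropy / lam    # rate spent on the compressed source
    total_rate = bin_index_rate + i_xz       # bin index plus in-bin randomization
    return {
        "I(X;Y)": i_xy,
        "I(X;Z)": i_xz,
        "total_rate": total_rate,
        "legitimate_decodable": total_rate < i_xy,
    }
```

With these assumptions, `wiretap_rates(0.3, 1.0, 0.05, 0.2)` reports that the legitimate receiver can decode, whereas raising the source entropy to 0.5 nats pushes the total code rate above $I(X^*;Y^*)$.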

6 Extension to Multiuser Settings 

The above ideas can be extended in a natural manner to multiuser communication situations, and in this section, we demonstrate this for the multiple access channel (MAC), where the underlying principle is again thermal equilibrium between the subsystems pertaining to the different users and that of the channel. As before, our focus is on the regime where reliable communication cannot hold (the paramagnetic phase).

As an example, consider a randomly selected joint source-channel code for a MAC with two users, in the following setting. We are given two independent sources, $S_1, S_2, \ldots$ and $T_1, T_2, \ldots$, governed by probability distributions $P_S(\cdot)$ and $P_T(\cdot)$, which are proportional to $\exp\{-\beta\mathcal{E}_S(\cdot)\}$ and to $\exp\{-\beta\mathcal{E}_T(\cdot)\}$, with partition functions $Z_S(\beta)$ and $Z_T(\beta)$, respectively. Each $N$-vector of the first source, $s = (s_1,\ldots,s_N) \in \mathcal{S}^N$, is encoded into a channel input vector $x_S(s) \in \mathcal{X}_S^n$, and each $N$-vector of the second source, $t = (t_1,\ldots,t_N) \in \mathcal{T}^N$, is encoded into a channel input vector $x_T(t) \in \mathcal{X}_T^n$. Both codebooks are selected independently, where each codevector of the first code is chosen independently according to distribution $M_S$ and each codevector of the second codebook is selected independently according to distribution $M_T$. Both codewords are fed into a memoryless MAC $W(y|x_S,x_T)$, which is proportional to $\exp\{-\beta\mathcal{E}_C(x_S,x_T,y)\}$. If we wish to estimate the mutual information $E\,I(S,T;Y)$ induced by the code, this is quite a trivial extension of the former derivation. But what about $E\,I(S;Y)$?

Here, it will be more convenient to adopt the alternative derivation of eq. (14). Considering the partition function

$$Z(\beta|y) = \sum_{s,t}\exp\{-\beta[\mathcal{E}_S(s) + \mathcal{E}_T(t) + \mathcal{E}_C(x_S(s),x_T(t),y)]\},$$

let $e_S^*$, $e_T^*$, and $e_C^*$ denote the dominant energies allocated to the source $S$, the source $T$, and the MAC, respectively. Also, for a typical randomly chosen codeword $x_S(s)$ of the source message $s$ actually transmitted, let us define $e^{n\phi_{n,\delta}(e|x_S(s),y)}$ as the probability (under $M_T$) that $\mathcal{E}_C(x_S(s),X_T,y)$ is between $n(e-\delta/2)$ and $n(e+\delta/2)$, for given $x_S(s)$ and $y$, and assume that as $n \to \infty$ and then $\delta \to 0$, $\phi_{n,\delta}(e|x_S(s),y)$ tends uniformly almost surely to a certain function which will be denoted by $\phi(e|S)$. Now,

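$$\begin{aligned}
\lim_{N\to\infty}\frac{E\,I(S;Y)}{N} &= \lim_{N\to\infty}\frac{1}{N}E\{\ln P(Y|S)\} - \lim_{N\to\infty}\frac{1}{N}E\{\ln P(Y)\} \\
&= \lim_{N\to\infty}\frac{1}{N}E\left\{\ln\left[\frac{1}{Z_T(\beta)}\sum_{t}\exp\{-\beta[\mathcal{E}_T(t) + \mathcal{E}_C(X_S(S),X_T(t),Y)]\}\right]\right\} \\
&\quad - \lim_{N\to\infty}\frac{1}{N}E\left\{\ln\left[\frac{1}{Z_S(\beta)Z_T(\beta)}\sum_{s,t}\exp\{-\beta[\mathcal{E}_S(s) + \mathcal{E}_T(t) + \mathcal{E}_C(X_S(s),X_T(t),Y)]\}\right]\right\} \\
&= \psi_S(\beta) + \Sigma_T(e_T^*) + \lambda\phi(e_C^*|S) - \beta(e_T^* + \lambda e_C^*) - \Sigma_T(e_T^*) - \Sigma_S(e_S^*) - \lambda\phi(e_C^*) + \beta(e_S^* + e_T^* + \lambda e_C^*) \\
&= \lambda[\phi(e_C^*|S) - \phi(e_C^*)]
\end{aligned} \tag{15}$$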

The last line of the above chain of equalities can be intuitively explained as follows: The term $-\lambda\phi(e_C^*)$ stands for $\lim_{N\to\infty} E\,I(S,T;Y)/N$, because of the same reasoning as before (if we look at the pair $(S,T)$ as one entity). The term $-\lambda\phi(e_C^*|S)$ corresponds to the conditional mutual information rate $\lim_{N\to\infty} E\,I(T;Y|S)/N$, since the true $S$ is given and only the random codeword of $T$ is selected. Thus, by the chain rule of the mutual information, the difference gives the mutual information rate between $S$ and $Y$.
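
In symbols, with $-\lambda\phi(e_C^*)$ and $-\lambda\phi(e_C^*|S)$ playing the roles of the joint and the conditional mutual information rates, respectively, the chain rule reads:

```latex
\begin{align*}
\lim_{N\to\infty}\frac{E\,I(S;Y)}{N}
&= \lim_{N\to\infty}\frac{E\,I(S,T;Y)}{N} - \lim_{N\to\infty}\frac{E\,I(T;Y|S)}{N} \\
&= -\lambda\phi(e_C^*) - \left[-\lambda\phi(e_C^*|S)\right]
 = \lambda\left[\phi(e_C^*|S) - \phi(e_C^*)\right].
\end{align*}
```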



Example 2. Consider the binary modulo-2 additive MAC, $Y = X_S \oplus X_T \oplus V$, where all variables take on values in $\{0,1\}$, $\oplus$ denotes addition modulo 2 (XOR), and $V$ is Bernoulli with parameter $p = \Pr\{V = 1\}$, independent of $X_T$ and $X_S$. Similarly as in Example 1, let the codebooks of the two users be generated by i.i.d. distributions with parameters $m_S$ and $m_T$, respectively. Now, as before, $e_C^* = p$, and the probability that $X_S \oplus X_T$, whose components are Bernoulli($m_S * m_T$), would fall within distance $np$ from a typical $y$, whose components are Bernoulli($m_S * m_T * p$), is exponentially $e^{n[h_2(p) - h_2(m_S * m_T * p)]}$, thus $-\phi(p) = h_2(m_S * m_T * p) - h_2(p)$. On the other hand, the probability of the same event conditioned on $x_S$ is the probability that $X_T$ would fall within distance $np$ from $y \oplus x_S = x_T \oplus v$ (which has Bernoulli($m_T * p$) components), and thus is of the exponential order of $e^{n\phi(p|S)} = e^{n[h_2(p) - h_2(m_T * p)]}$. It follows then that

$$\lim_{N\to\infty}\frac{E\,I(S;Y)}{N} = \lambda[h_2(m_S * m_T * p) - h_2(m_T * p)].$$

In the special case where $m_T = 1/2$, we get $\lim_{N\to\infty} I(S;Y)/N = 0$ regardless of $m_S$, in agreement with intuition, as $X_T$ behaves like Bernoulli(1/2) noise in the paramagnetic regime.
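
The formulas of Example 2 are easy to evaluate numerically. The following sketch computes the joint, conditional and marginal mutual information rates for the modulo-2 MAC; the function names and the parameter values used when calling it are illustrative assumptions.

```python
import math

def h2(x):
    """Binary entropy in nats."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log(x) - (1 - x) * math.log(1 - x)

def conv(a, b):
    """Binary convolution a * b = a(1-b) + b(1-a)."""
    return a * (1 - b) + b * (1 - a)

def mac_rates(lam, m_s, m_t, p):
    """Per-source-symbol rates (nats) for the modulo-2 additive MAC."""
    phi_joint = h2(p) - h2(conv(conv(m_s, m_t), p))   # phi(p)
    phi_given_s = h2(p) - h2(conv(m_t, p))            # phi(p|S)
    joint = -lam * phi_joint                          # rate of I(S,T;Y)
    conditional = -lam * phi_given_s                  # rate of I(T;Y|S)
    return joint, conditional, joint - conditional    # last entry: rate of I(S;Y)
```

For example, `mac_rates(1.0, 0.2, 0.5, 0.1)` returns a vanishing rate for $I(S;Y)$, since a fair-coin codebook for the second user masks the first user completely, while `mac_rates(2.0, 0.3, 0.2, 0.1)` gives a strictly positive one.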
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